# Decision Tree | Tree Ensemble | XGBoost
## Decision Tree Basics
- Classify based on features that incrementally cluster similar items together, eventually arriving at a final classification
- Select the feature that most quickly narrows down to a decisive result
### Decision 1: Explore Features
- A good feature is one that drives the split toward a pure node (a single class) faster
### Decision 2: When to stop splitting
- When further splitting does not improve purity beyond a minimum threshold
### Concept of Entropy
- Measurement of impurity
- p1 = 0 and p1 = 1 are the most pure; p1 = 0.5 is the most impure
- Equation, where p1 is the fraction of positive examples in the node and p0 = 1 - p1 is the fraction of negative examples:

  H(p1) = -p1 * log2(p1) - p0 * log2(p0)
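A minimal sketch of this entropy formula in Python; the `entropy` helper name and the NumPy dependency are just for illustration:

```python
import numpy as np

def entropy(p1: float) -> float:
    """Entropy H(p1) of a node, where p1 is the fraction of positive examples
    and p0 = 1 - p1. Pure nodes (p1 = 0 or 1) have entropy 0; the most impure
    node (p1 = 0.5) has entropy 1."""
    if p1 == 0 or p1 == 1:
        return 0.0  # by convention 0 * log2(0) = 0
    p0 = 1.0 - p1
    return -p1 * np.log2(p1) - p0 * np.log2(p0)

print(entropy(0.5))  # 1.0 -> most impure
print(entropy(1.0))  # 0.0 -> pure node
```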
### How to choose which split to go to
- The choice is driven by a concept called Information Gain, which is calculated from entropy
- At a decision node, the weighted average of the entropies of the left and right branches is calculated, weighted by the fraction of examples that fall into each branch. The split whose weighted entropy shows the biggest reduction from the entropy at the root node, i.e. the highest Information Gain, is the preferred split
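A minimal sketch of the Information Gain calculation, reusing the `entropy(p1)` helper from the sketch above; the `information_gain` name and the example labels are illustrative assumptions:

```python
import numpy as np

def information_gain(y_root, y_left, y_right):
    """Entropy at the root minus the weighted average entropy of the left
    and right branches (weights = fraction of examples in each branch)."""
    w_left = len(y_left) / len(y_root)
    w_right = len(y_right) / len(y_root)
    h = lambda y: entropy(np.mean(y)) if len(y) else 0.0  # reuse entropy(p1) above
    return h(y_root) - (w_left * h(y_left) + w_right * h(y_right))

# Example: a candidate split on ten examples (five of each class)
y_root  = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
y_left  = np.array([1, 1, 1, 1, 0])   # examples where the feature is 1
y_right = np.array([1, 0, 0, 0, 0])   # examples where the feature is 0
print(information_gain(y_root, y_left, y_right))  # roughly 0.28
```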
### Summary of Decision Tree Flow
- Start with all examples at the root node
- Calculate Information Gain for all possible features and choose the one with the highest gain
- Split the data according to the selected feature
- Keep repeating the splitting until one of the stopping criteria is met (see the sketch after this list):
  - A node has 100% of one class
  - Information Gain is less than the threshold value
  - Splitting would exceed the maximum tree depth decided
  - The number of examples in a node is below a threshold
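A rough sketch of this flow, assuming binary (0/1) features and labels and reusing the `information_gain` helper sketched earlier; `build_tree` and its threshold defaults are illustrative, not a reference implementation:

```python
import numpy as np

def build_tree(X, y, depth=0, max_depth=3, min_samples=2, min_gain=1e-3):
    """Recursive sketch of the flow above: stop on a pure node, max depth,
    too few examples, or too little information gain; otherwise split on the
    feature with the highest gain and recurse into the two branches."""
    p1 = np.mean(y)
    if p1 in (0.0, 1.0) or depth >= max_depth or len(y) < min_samples:
        return {"leaf": True, "prediction": int(p1 >= 0.5)}

    # Evaluate Information Gain for every (binary) feature
    best = None
    for feature in range(X.shape[1]):
        left, right = y[X[:, feature] == 1], y[X[:, feature] == 0]
        gain = information_gain(y, left, right)
        if best is None or gain > best[1]:
            best = (feature, gain)

    feature, gain = best
    if gain < min_gain:  # splitting no longer helps enough
        return {"leaf": True, "prediction": int(p1 >= 0.5)}

    mask = X[:, feature] == 1
    return {
        "leaf": False,
        "feature": feature,
        "left": build_tree(X[mask], y[mask], depth + 1, max_depth, min_samples, min_gain),
        "right": build_tree(X[~mask], y[~mask], depth + 1, max_depth, min_samples, min_gain),
    }
```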
## Tree Ensemble
- Multiple independent decision trees are trained, and the final decision is made by a majority vote across them
- The trees are made different from one another by training each tree on a sample drawn with replacement (a bootstrap sample) from the training set
- Around 100 trees is a typical choice; a larger number of trees usually does not provide significantly higher accuracy
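A minimal sketch of sampling with replacement (bagging) plus majority voting, using scikit-learn's `DecisionTreeClassifier` as the base tree; the helper names `bagged_trees` and `predict_majority` are illustrative, and 100 trees follows the rule of thumb above:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_trees(X, y, n_trees=100, seed=0):
    """Train n_trees independent trees, each on a bootstrap sample
    (sampling with replacement) of the training data."""
    rng = np.random.default_rng(seed)
    n = len(y)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)  # sample indices with replacement
        trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return trees

def predict_majority(trees, X):
    """Each tree votes on every example; the majority class wins."""
    votes = np.stack([tree.predict(X) for tree in trees])
    return (votes.mean(axis=0) >= 0.5).astype(int)
```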
## XGBoost
- Full form: eXtreme Gradient Boosting
- Algorithm that works with a Tree Ensemble by putting more focus on the examples misclassified in the previous round
- In contrast to Random Forest, which builds unrelated (independently sampled) decision trees, XGBoost fits trees one after the other, with each new tree built to reduce the remaining error of the trees before it
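A short usage sketch, assuming the `xgboost` package is installed (`pip install xgboost`); the synthetic dataset and hyperparameter values here are placeholders, not recommendations:

```python
from xgboost import XGBClassifier
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Placeholder tabular dataset for illustration only
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Boosted ensemble: each new tree is fit to correct the errors of the trees so far
model = XGBClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
model.fit(X_train, y_train)

print(accuracy_score(y_test, model.predict(X_test)))
```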
## Decision Tree vs. Neural Network
| Decision Tree | Neural Network |
|---|---|
| Works well on structured data | Works well on structured and unstructured data |
| Recommended for tabular data | Recommended for speech, text, and video data |
| Faster processing | Slower processing |
| Human interpretable | Not easy for humans to interpret |
| Cannot leverage transfer learning | Transfer learning can help improve accuracy |
| Mostly works as a single model per system | Multiple networks can be strung together to build a system out of several models |